XML Lossy Text Compression: A Preliminary Study
Abstract
Lossy compression techniques have been applied to image and text compression, yielding compression factors far superior to those of lossless compression schemes. In this paper, we present a preliminary study of a set of lossy transformations for XML documents that preserve their semantics. Inspired by previous techniques, e.g. lossy text compression and literate programming, we apply a simple algorithm to XML syntactic constructs to discard superfluous layout information and redundant text. The resulting XML remains both human-readable and machine-readable. Additionally, the transformation can considerably reduce the document's space occupancy and boost the performance of conventional text compressors applied to it, making it a promising technology for several data management tasks.
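The abstract does not spell out the algorithm. What follows is therefore only a minimal sketch of the layout-removal idea, not the authors' method. It assumes data-oriented XML, where whitespace-only text between elements (indentation and line breaks) carries no meaning. It uses Python's standard `xml.etree.ElementTree` and `zlib` modules. The helper names `strip_layout` and `compressed_size` are hypothetical, and the redundant-text transformations mentioned above are not reproduced.

```python
import xml.etree.ElementTree as ET
import zlib


def strip_layout(xml_text: str) -> str:
    """Drop whitespace-only text and tail nodes (indentation, newlines).

    Assumption: whitespace between elements is not significant for the
    application. This holds for data-oriented XML, not for mixed content.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Text directly inside the element, before its first child.
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        # Text after the element's closing tag, before the next sibling.
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


def compressed_size(text: str) -> int:
    """Size in bytes after a conventional (DEFLATE) text compressor."""
    return len(zlib.compress(text.encode("utf-8"), 9))


if __name__ == "__main__":
    doc = """<catalog>
    <book id="1">
        <title>XML Basics</title>
        <year>2001</year>
    </book>
    <book id="2">
        <title>Lossy Text</title>
        <year>2005</year>
    </book>
</catalog>"""
    compact = strip_layout(doc)
    print(compact)
    print(len(doc), "->", len(compact), "characters")
    print(compressed_size(doc), "->", compressed_size(compact), "bytes compressed")
```

The compact output is still well-formed XML that any parser accepts. A pretty-printer can re-indent it for human readers, so only the original whitespace is lost. Applying the same step to mixed-content documents, such as XHTML prose, would be unsafe because whitespace there can be significant.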
Similar resources
Toward Remote Object Coherence with Compiled Object Serialization for Distributed Computing with XML Web Services
Cross-platform object-level coherence in Web services-based distributed systems and grids requires lossless serialization to ensure programming-language specific objects are safely transmitted, manipulated, and stored. However, Web services development tools often suffer from lossy forms of XML serialization, which diminishes the usefulness of XML Web services as a competitive approach to binar...
Semantic Lossy Compression of XML Data
In recent years, a large amount of semistructured data [1, 10] has been managed and exchanged. The largest repository of semistructured data is the World Wide Web, which can be thought of as an enormous database in which data is highly heterogeneous and freely correlated. In this scenario we find the Extensible Markup Language (XML) [14], a language for semistructured data standardised by the Wo...
Compact XML grammar based compression
Extensible Markup Language (XML) is the standard format for content representation and sharing on the Web. XML is a highly verbose language, especially regarding the duplication of meta-data in the form of elements and attributes. As XML content is becoming more widespread so is the demand to compress XML data volume. This paper presents a new grammar, called D-grammar, which defines XML struct...
XML Structure Compression
XML is becoming the universal language for communicating information on the Web and has gained wide acceptance through its standardisation. As such XML plays an important enabling role for dynamic computation over the Web. Compression of XML documents is crucial in this process as, in its raw form, it often contains a sizable amount of redundancy. Several XML compression algorithms have been pr...
Squeezex: Synthesis and Compression of XML Data
XML is emerging as the “universal” language for semistructured data description/exchange, and new issues regarding the management of XML data, both in terms of performance and usability, are becoming critical. The application of knowledge-based synthesization and compression methods (i.e. derivation of synthetic views and lossless/lossy approximation of contents) can be extremely beneficial in ...